Abstract - To achieve the extreme accuracy rates demanded by
applications in unsupervised automated cytology, it is frequently
necessary to supplement the primary segmentation algorithm with a
segmentation quality control system. The more robust the segmentation
strategy, the less severe the data pruning need be at the segmentation
validation stage. These issues are addressed as we describe our cell
nucleus segmentation strategy, which is able to achieve 100% accurate
segmentation from a data set of 19946 cell nucleus images by
automatically discarding the most difficult cell images. The automatic
quality checking is applied to enhance the performance of a robust
energy minimisation based segmentation scheme which already achieved
a 99.47% accurate segmentation rate.

Keywords - cell, cytology, image, segmentation, robust

I.  Introduction 

Machine vision systems for the unsupervised automation of otherwise
manual tasks usually require image processing components with
exceptionally high accuracy rates. This is especially true in the
biomedical domain, where failures result in misdiagnoses. The fact
that research is still continuing on the development of a cervical
cancer screening machine, despite projects being initiated in the
1950s, is perhaps a good indication of the magnitude of the
development effort required to go from an algorithm obtaining “good”
results on a small test data set to one obtaining acceptable levels of
accuracy in a real environment. The main difficulty with this
application has been identified as the robust segmentation of cells
and cell nuclei. Indeed, Bengtsson [5] says that the segmentation
stage is “the key to a working machine”, echoing the sentiments of
Gonzalez and Woods [9] that “effective segmentation rarely fails to
lead to a successful solution.” Many algorithms have been proposed in
the past with varying degrees of success, but just as important as a
high accuracy rate is knowing when a failure has occurred, as ‘an
erroneously segmented cell is much worse than a rejected cell’ [13].

Many researchers have included artefact and incorrect segmentation
rejection schemes in their algorithms. MacAulay used a post-processing
step after segmentation to remove potential artefacts based on shape
and appearance that was capable of detecting some of the incorrectly
segmented nuclei [11]. Nordin describes an algorithm that is able to
report failures at various stages of the segmentation process, as well
as a separate artefact rejection algorithm [13]. McKenna made use of a
neural network to select potential nuclei in scenes for subsequent
segmentation. He noted that a post-processing stage would also be
necessary to filter out “erroneously detected objects” [12].

A common theme in these techniques is the need for a separate quality
control process to view the output of the segmentation. Typically
these processes apply shape and appearance criteria to classify the
results as either “pass” (looks like a cell) or “fail” (doesn’t look
like a cell). We have developed a segmentation strategy that not only
employs a segmentation algorithm with much higher performance than
previously reported [4], but which also provides a confidence measure
in the resulting segmentation without explicit reference to shape and
appearance criteria for quality check purposes.

II.  The  Segmentation  Stage 

For  a  full  explanation  of  the  underlying  segmentation 
technique,  the  reader  is  referred  to  [4]  and  [1].  The  method  is 
based  on  energy  minimisation  techniques  and  is  summarised 
here  only  to  introduce  the  development  of  the  subsequent 
quality  checking  strategy. 

A.  Energy  Minimisation  Implementation 

The use of active contours in bio-medical applications is well
established, but it is well known that these methods tend to suffer
from local minima, initialisation, and stopping criteria problems.
Fortunately, global minimum energy searching methods have been found
to be particularly effective in avoiding local minima problems due to
the presence of the many artefacts often associated with medical
images [6] [7] [8]. Here, a dynamically programmed search method was
implemented based upon a suggestion in [10]. A search space is first
set up within the image, bounded by two concentric circles centred
upon the approximate centre of the nucleus found by an initial rough
segmentation technique (e.g. the converging squares algorithm). This
search space is sampled to form a circular trellis by discretising
both the circles and a grid of evenly-spaced radial lines joining them
(figure 1).


Figure 1. Discrete search space


Every possible contour that lies upon the nodes of the search space is
then evaluated and an associated energy or cost function is
calculated. This cost is a function of both the contour’s smoothness
and how closely it follows image edges. The relative weighting of the
cost components is controlled by a single regularisation parameter,
λ ∈ [0, 1]. By choosing a high value of λ, the smoothness term
dominates, which may lead to contours that tend to ignore important
image edges. On the other hand, low values of λ allow contours to
develop sharp corners as they attempt to follow all high gradient
edges, even those which may not necessarily be on the desired object’s
edge. Once every contour has been evaluated, the single contour with
least cost is chosen as the global solution. The well-known Viterbi
algorithm provides an efficient method to find this global solution as
described in [4].
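The search described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the cost function of [4], which is not reproduced in this paper: the function name min_cost_contour, the edge_energy array (low where the image gradient is high), the smoothness penalty (a normalised squared radial jump) and the way the contour is closed (fixing the start node and trying each in turn) are all hypothetical choices.

```python
def min_cost_contour(edge_energy, lam):
    """Return (path, cost) of the cheapest closed contour on the trellis.

    edge_energy[k][r]: assumed image energy at radial line k, radial node r
                       (low values correspond to strong edges).
    lam: regularisation weight in [0, 1]; higher values favour smoothness.
    """
    n_angles, n_radii = len(edge_energy), len(edge_energy[0])
    span = max(n_radii - 1, 1)

    def jump(i, j):
        # Hypothetical smoothness penalty: normalised squared radial jump.
        return ((i - j) / span) ** 2

    best_path, best_cost = None, float("inf")
    for start in range(n_radii):  # fix the start node so the contour closes
        cost = [float("inf")] * n_radii
        cost[start] = (1 - lam) * edge_energy[0][start]
        back = []  # back[k - 1][j]: best predecessor of node j on radial line k
        for k in range(1, n_angles):
            new_cost, pointers = [], []
            for j in range(n_radii):
                i = min(range(n_radii), key=lambda i: cost[i] + lam * jump(i, j))
                pointers.append(i)
                new_cost.append(cost[i] + lam * jump(i, j)
                                + (1 - lam) * edge_energy[k][j])
            cost = new_cost
            back.append(pointers)
        # Add the smoothness cost of the step that closes the contour.
        closed = [cost[j] + lam * jump(j, start) for j in range(n_radii)]
        end = min(range(n_radii), key=closed.__getitem__)
        if closed[end] < best_cost:
            path = [end]
            for pointers in reversed(back):  # trace the predecessors back
                path.append(pointers[path[-1]])
            best_path, best_cost = path[::-1], closed[end]
    return best_path, best_cost


# Toy trellis: a ring of strong edges at radial node 3, plus one stronger
# "artefact" edge at node 0 on radial line 4.
energy = [[1.0] * 6 for _ in range(8)]
for row in energy:
    row[3] = 0.2
energy[4][0] = 0.0

print(min_cost_contour(energy, 0.0)[0])  # follows the artefact on line 4
print(min_cost_contour(energy, 0.7)[0])  # stays on the ring
```

In this toy case the contour at lam = 0.0 simply takes the lowest-energy node on every radial line, while at lam = 0.7 the smoothness term makes the detour to the artefact too expensive.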

B.  Segmentation  Performance 

A  data  set  of  19946  Pap  stained  cervical  cell  images  was 
available  for  testing.  These  images  were  of  the  order  of 
128x128  pixels,  quantised  to  256  gray  levels  and  each  con¬ 
tained  a  single  nucleus. 

The single parameter λ that controls the behaviour of the algorithm
was empirically chosen to be 0.7 after trial runs on a small sub-set
of the images. This sub-set was made up of 141 known ‘difficult’
images from previous studies [4] [3], augmented by a random sample of
269 images from the remaining data set. This careful data selection
was necessary as previous experience showed that for the majority of
images, the resulting segmentation was fairly insensitive to the
choice of λ, making the choice of optimum value difficult.
Nevertheless, more demanding images require some adjustment to the
parameter to achieve correct segmentation. The effect of the choice of
λ on segmentation accuracy on this trial set is shown by the graph of
figure 2.


Figure 2. Plot of percentage of correct segmentations against λ for a
set of images consisting of known ‘difficult’ images and randomly
selected images.


With λ set at 0.0, the smoothness constraint is completely ignored and
the point of greatest gradient is chosen along each search space
radius. Previous studies [3] have shown that for approximately 65% of
images, all points of greatest gradient actually lie upon the
nucleus-cytoplasm border (figure 3(a)), so these cell images will be
correctly segmented.


Figure 3. λ = 0.0. (a) Largest gradients occur on the nucleus border,
(b) darkly stained chromatin generates largest gradients, (c) dark
artefacts generate largest gradients.

For the remaining 35% of images, a large gradient due to an artefact
or darkly stained chromatin will draw the contour away from the
desired border (figures 3(b) and 3(c)). As λ increases, the large
curvatures present in these configurations become less probable
(figure 4).

Figure 4. The effect of increasing λ. (a) λ = 0.1, (b) λ = 0.2,
(c) λ = 0.5.

The graph of figure 2 shows a value of λ = 0.7 as the most suitable
for these particular images. Every image in the data set was then
segmented at λ = 0.7 and the results verified by eye. Of the 19946
images, 99.47% were found to be correctly segmented. Three main
classes of failure were identified. Eighty-seven of the failures were
due to the nuclei lying close to the cytoplasm boundary. As the
contrast of the background-cytoplasm boundary is much greater than
that of the nucleus-cytoplasm boundary, the contour tended to lie upon
the former, an area of very low image energy (high gradient edges).
Fourteen of the failures were caused by an inappropriate choice of λ
for that individual image (they all subsequently produced correct
segmentations with different values of λ). The remaining four images
were found to fail at all attempts. The failures due to the presence
of the background in the nucleus images are preventable through
careful design of a prior cell-finding stage [2]. Here, the
cytoplasm-background boundary is known and can therefore be prevented
from appearing in the nucleus images. The detection of the other
classes of failure is therefore the major issue.
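As a quick consistency check on the figures quoted above, the three failure counts can be summed and compared with the reported accuracy. The dictionary keys are shorthand labels introduced here for illustration; only the numbers come from the text.

```python
# Failure counts reported for segmentation at lambda = 0.7.
failures = {
    "nucleus close to cytoplasm boundary": 87,
    "unsuitable lambda for that image": 14,
    "failed at every lambda tried": 4,
}
total_images = 19946

n_failed = sum(failures.values())
n_correct = total_images - n_failed
accuracy = 100 * n_correct / total_images

print(n_failed, n_correct, round(accuracy, 2))  # 105 19841 99.47
```

The totals agree with the 105 incorrectly and 19841 correctly segmented images used later when the data set is split.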

III.  Development  of  an  Error  Checking 
Framework 

Despite the exceptionally high accuracy rate that the global minimum
searching contour method achieves, there is still a possibility of
sample contamination from the few failures that do occur. To prevent
this, a human would still need to view the output of this stage,
undermining its utility in a practical system. The remainder of this
paper therefore concerns itself with the development of a framework
that further increases the accuracy of a potential system.

A.  Lambda  Sensitivity 

For the majority of relatively simple images with little ambiguity in
the true location of the nuclear boundary, the final segmentation can
be fairly insensitive to λ over a wide range of values (figure 5).

By contrast, ‘difficult’ images (even for humans) produce very
different contours depending upon the choice of λ (figures 4 and 6).

These images usually contain artefacts near or on the nuclear
boundaries that make the ‘true’ border hard to find. These examples
show that no single value of λ is capable of accurately segmenting all
of the images. Therefore, rather than segment the images at one value
of λ and use a post-process to reject possible failures, we are
interested in viewing the output of the algorithm for various values
of λ in order to detect stability as a measure of confidence in the
resulting segmentation.

B.  Error  Checking 

The graph of figure 2 shows monotonically increasing segmentation
accuracy for 0.0 ≤ λ ≤ 0.7. In fact, from the data it was observed
that the set of correct segmentations at λ1 was a strict subset of the
set of correct segmentations at λ2, where λ1 < λ2 ≤ 0.7. Therefore, if
an image is segmented at λ = 0.7 (the value with the highest
probability of correct segmentation) and then again at λ = 0.0,
similarity between the two contours indicates a high level of contour
stability (figure 5). Such an image is classified as a ‘very easy’
image to segment and for convenience labelled “level 0”. Lack of
similarity leads to a comparison of the contour at λ = 0.7 with the
contour at λ = 0.1; similarity here leads to a classification of
level 1, and so on.

This classification method suggests a means to discard incorrect
segmentations. For example, if we keep only level 0 (very easy) cell
images, we discard approximately a third of the data set, but achieve
a 100% correct segmentation rate on those retained [3].
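The level assignment described above amounts to a simple cascade, sketched below. The function name stability_level, the representation of a contour as a list of radial node indices, and the is_similar predicate are assumptions made for illustration; how similarity is actually judged is left to the caller here.

```python
def stability_level(reference, candidates, is_similar):
    """Return the first level whose contour matches the reference, else None.

    reference:  contour obtained at lambda = 0.7.
    candidates: contours at lambda = 0.0, 0.1, ... in increasing order,
                so that candidates[n] belongs to level n.
    is_similar: caller-supplied predicate comparing two contours.
    """
    for level, contour in enumerate(candidates):
        if is_similar(reference, contour):
            return level
    return None  # no stable match was found at any level


# Toy example: contours as radial node indices, similarity as exact equality.
reference = [3, 3, 3, 3]
candidates = [[0, 3, 3, 3], [3, 3, 3, 3], [3, 3, 3, 3]]
print(stability_level(reference, candidates, lambda a, b: a == b))  # 1
```

Exact equality is used in the toy call only to keep the example short; any tolerance-based comparison can be passed in its place.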

IV.  Fine  Tuning 

In order to pursue this method, the data set was split into two sets:
T, those images that had been incorrectly segmented at λ = 0.7 (105
images), and C, those that had been correctly segmented (19841
images). Statistics were then measured for each level by comparing the
segmentation result at λ = 0.7 with those at λ = 0.0, 0.1, ..., 0.6
for every image in both sets.

Figure 5. Example of an image that is stable over a range of λ.
(a) λ = 0.1, (b) λ = 0.5, (c) λ = 0.7.

Level                0     1     2     3     4     5     6
Threshold (pixels)   4.85  3.20  2.45  0.79  0.79  0.79  0.79

Table 1. Minimum MAD thresholds for the detection of every element in
T (incorrect segmentation at λ = 0.7 on the test data set) for levels
0 - 6.

As the contours to be compared were the result of the same algorithm
and indeed the same search space in the image, the comparison between
any two contours is trivial. For each radial line of the search space
(figure 1), the distance between the points chosen by the two contours
was calculated, and the maximum absolute deviation (MAD) over all
radial lines was evaluated.
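A minimal sketch of this comparison follows. It assumes that each contour is stored as one radial node index per radial line and that adjacent nodes are a uniform 0.79 pixels apart; the function name and both of these representation choices are assumptions made for illustration.

```python
def max_abs_deviation(contour_a, contour_b, radial_step=0.79):
    """Maximum absolute deviation (in pixels) between two contours.

    contour_a, contour_b: radial node index chosen on each radial line,
                          both defined on the same search space.
    radial_step: assumed uniform spacing in pixels between adjacent nodes.
    """
    if len(contour_a) != len(contour_b):
        raise ValueError("contours must share the same search space")
    return radial_step * max(abs(a - b) for a, b in zip(contour_a, contour_b))


# The largest disagreement is two nodes, on the second radial line.
print(max_abs_deviation([3, 3, 3, 3], [3, 5, 3, 2]))
```

Because both contours share the same trellis, no resampling or point matching is needed before taking the maximum.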

A cumulative plot of the percentage of the set T against MAD for level
zero (comparison between contours at λ = 0.7 and λ = 0.0) is shown in
figure 7.

This  graph  shows  that  for  level  0,  a  MAD  threshold  of 
4.84  pixels  would  detect  every  failed  segmentation.  In  a 
similar  manner,  it  is  possible  to  establish  thresholds  for  each 
level  so  that  the  detection  of  every  failed  segmentation  in  this 
database  is  guaranteed  (table  1). 

The thresholds decrease with increasing level. This is expected as
closer values of λ are compared at higher levels. The values then
taper to a limit of 0.79 pixels as this is the distance between two
adjacent radial points on the discrete search space.

In  order  to  establish  the  effect  of  setting  such  thresholds 
on  C,  the  percentage  of  C  that  would  be  discarded  against 
MAD  threshold  for  level  zero  is  shown  in  figure  8. 

Therefore, by setting a threshold of 4.84 pixels and rejecting any
segmentation with a greater MAD, 40.2% of the correct segmentations
would be discarded. This procedure may be repeated for each level,
using the thresholds previously calculated. The percentage of C that
falls above the threshold for each level (i.e. a good segmentation
being discarded) against MAD is shown in figure 9.
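The trade-off measured here can be sketched as follows, assuming the MAD of every image in each set has already been computed for the level of interest. The function name rejection_rates and the sample values are hypothetical and are not taken from the data set.

```python
def rejection_rates(failed_mads, correct_mads, threshold):
    """Return (% of failures detected, % of correct segmentations discarded).

    A segmentation is rejected when its MAD exceeds the threshold.
    """
    detected = sum(m > threshold for m in failed_mads)
    discarded = sum(m > threshold for m in correct_mads)
    return (100 * detected / len(failed_mads),
            100 * discarded / len(correct_mads))


# Made-up MAD values (pixels) purely to exercise the function.
failed = [5.0, 9.1, 12.3]
correct = [0.0, 0.79, 1.58, 6.0]
print(rejection_rates(failed, correct, threshold=4.84))  # (100.0, 25.0)
```

Sweeping the threshold with such a function gives curves of the kind plotted in figures 7 and 8.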


Figure 6. Example of an image that is not stable over a range of λ.
(a) λ = 0.5, (b) λ = 0.7.

Figure 7. A plot of the percentage of the elements of T (incorrect
segmentation at λ = 0.7 on the test data set) against measured MAD for
level 0.

Figure 8. A plot of the percentage of elements of C (correct
segmentations at λ = 0.7 on the test data set) rejected against MAD
threshold for level 0.


Although a harsher threshold is used at level 1 than at level 0, fewer
correct segmentations are discarded. This is due to the absence of any
smoothness constraint at λ = 0.0, which leads to wild deviations such
as those shown in figure 3. However, the small smoothness contribution
at λ = 0.1 corrects many of these deviations, resulting in the large
drop in average MAD (table 2).

Therefore, by running at level 2, it is possible to detect every
failure and only discard 10.78% of the correct segmentations.

Figure 9. Plot of the percentage of elements of C (correct
segmentations at λ = 0.7 on the test data set) rejected at each level
using the thresholds of table 1.

Level                  0     1     2     3     4     5     6
Average MAD (pixels)   8.90  2.78  1.91  1.56  1.35  1.22  1.07

Table 2. Average MAD for levels 0 - 6.

V. Conclusions

By analysing the modes of failure of a highly successful cell nucleus
segmentation algorithm, an error checking framework was implemented
that was capable of detecting every failure. The algorithm parameter λ
was first empirically tuned for the data set to obtain best
segmentation accuracy. It was then observed that different values of λ
obtained different solutions for difficult images, but simple images
generated stable solutions. Therefore, by varying λ this stability
could be detected. A decision to reject or accept the segmentation was
then made, based upon measured thresholds for each level. For the data
set of 19946 images, it was found that by comparing the resulting
contours at values of λ = 0.7 and λ = 0.2, and rejecting the
segmentation if the maximum absolute deviation (MAD) between the
contours was greater than 2.45 pixels, every failure could be detected
whilst only discarding 10.78% of the correct segmentations.

In this study, only values of λ with a resolution of 0.1 have been
considered. It is possible that by increasing this resolution in the
region of interest (i.e. near ‘level 2’ operation) and repeating the
exercise, a further increase in performance could be achieved.
Naturally, the parameters and results that have been reported are
optimised not only for one type of image but also for the hardware
configuration that was used to capture them. Current work involves the
incorporation of the proposed system into a fully automated Cytometer
(an automatic imaging system), using the same methodology to achieve
optimal performance for that hardware. This allows for much more
extensive analysis of the proposed methods through access to a greater
amount of data. This result has great potential for the implementation
of unsupervised cancer screening devices using methods where only a
representative sample of cells is required. Preliminary studies have
shown strong possibilities for the non-invasive early detection of
lung and other forms of cancer.

Finally, the ‘rejected’ cells have simply been labelled as such. These
could be interpreted as having been ‘flagged’ by the algorithm as
problematic, requiring processing by a higher level (e.g. to invoke a
different algorithm). By achieving such high accuracy rates and
confidence in the segmentation stage, the following feature extraction
and classification processes can only become more robust.